Cocojunk

Graphics processing unit

Published: Sat May 03 2025 19:14:06 GMT+0000 (Coordinated Universal Time)
Last Updated: 5/3/2025, 7:14:06 PM

The Graphics Processing Unit (GPU): A Specialized Engine for Visuals and Computation

In the journey of building a computer from scratch, understanding each major component is crucial. While the Central Processing Unit (CPU) is often called the "brain," the Graphics Processing Unit (GPU) acts as a highly specialized co-processor, primarily designed to handle the demanding tasks associated with displaying images on a screen. Over time, its capabilities have expanded significantly, making it indispensable for much more than just graphics.

Graphics Processing Unit (GPU): A specialized electronic circuit designed to accelerate the creation of images intended for output to a display device. Modern GPUs are highly parallel processors capable of performing a large number of calculations simultaneously, making them useful for both graphics rendering and other computational tasks.

Unlike the CPU, which is designed for a wide range of serial tasks (executing one instruction after another), the GPU is architecturally optimized for parallel processing. This means it can perform thousands of similar operations at the same time, which is perfectly suited for graphics where millions of pixels or vertices need the same calculations applied to them.
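
To make the contrast concrete, here is a minimal Python sketch rather than real GPU code. The names `brighten`, `run_serial`, and `run_data_parallel` are illustrative assumptions, and a small thread pool merely stands in for the thousands of hardware lanes a GPU provides. The point is that the same tiny function is applied to every pixel independently, so the work can be split up freely.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel: int, amount: int = 40) -> int:
    """Per-pixel operation: identical for every pixel, independent of its neighbors."""
    return min(255, pixel + amount)

def run_serial(pixels):
    # CPU style: process one pixel after another.
    out = []
    for p in pixels:
        out.append(brighten(p))
    return out

def run_data_parallel(pixels, workers=4):
    # GPU style (emulated): map the same operation over all pixels at once.
    # The worker pool is only a stand-in for parallel hardware lanes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels))

pixels = [0, 100, 200, 250]
serial_result = run_serial(pixels)
parallel_result = run_data_parallel(pixels)
```

Both functions produce the same result; only the execution strategy differs, which is why graphics workloads map so naturally onto parallel hardware.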

Initially, the GPU's role was strictly to offload graphics-related tasks from the CPU. As graphics became more complex, particularly with the advent of 3D rendering, the computational demands grew rapidly. Even a powerful CPU would become a bottleneck when trying to manage all the calculations required for realistic graphics. This led to the evolution of dedicated graphics processors, designed specifically to handle these parallel workloads efficiently.

The usefulness of this parallel design extends far beyond graphics. GPUs have become critical components in fields like Artificial Intelligence (AI), scientific simulations, data analysis, and cryptocurrency mining, where large datasets and computationally intensive, parallelizable problems are common.

A Brief History of Graphics Hardware

The concept of specialized hardware for graphics isn't new, evolving alongside computing itself as the desire for more complex visual output grew.

Early Graphics Circuits (1970s)

Before dedicated graphics processors as we know them, early systems, particularly arcade games, used specialized circuits to assist the CPU with graphics. This was partly due to the high cost of memory required for full frame buffers. Instead of having enough memory to store the entire image pixel by pixel, these circuits helped compose graphics data on the fly as the display scanned across the screen.

Frame Buffer: A portion of computer memory that holds the complete image currently being displayed on a screen. Each pixel on the screen corresponds to data stored in the frame buffer.
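
As a rough illustration of the definition above, a frame buffer can be modeled as a flat block of bytes in which each pixel lives at a computed offset. The sketch below assumes a hypothetical 8x4 display with one byte per pixel; the helper names are made up for this example.

```python
WIDTH, HEIGHT = 8, 4  # hypothetical tiny display, 1 byte per pixel

# The frame buffer: one contiguous block of memory holding every pixel.
framebuffer = bytearray(WIDTH * HEIGHT)

def pixel_offset(x: int, y: int) -> int:
    """Rows are stored one after another, so row y starts at y * WIDTH."""
    return y * WIDTH + x

def set_pixel(x: int, y: int, value: int) -> None:
    framebuffer[pixel_offset(x, y)] = value

def get_pixel(x: int, y: int) -> int:
    return framebuffer[pixel_offset(x, y)]

def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Memory needed to hold one full frame."""
    return width * height * bits_per_pixel // 8

set_pixel(3, 2, 255)
vga_16_color_bytes = framebuffer_bytes(640, 480, 4)  # 4 bits = 16 colors
```

The size calculation shows why memory cost mattered: even a 640x480 image at 16 colors needs 153,600 bytes, a large amount by the standards of 1970s hardware.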

Examples of early specialized hardware include:

Barrel Shifter: A circuit used in some early arcade games (like Gun Fight or Space Invaders) to rapidly shift pixel data, useful for animating graphics elements efficiently without complex CPU calculations for each pixel movement.
Tilemap Backgrounds & Sprites: Hardware support for arranging pre-designed image tiles and movable sprite objects (small, independent images) simplified game development and reduced memory needs compared to drawing everything as a large bitmap. The Namco Galaxian hardware (1979) was notable for its advanced sprite and tilemap capabilities.
Video Processors: Chips like the Television Interface Adaptor (TIA) in the Atari 2600 (1977) or ANTIC in the Atari 8-bit computers (1979) were essentially early video co-processors. ANTIC, for instance, could interpret a "display list" of instructions controlling how scan lines were drawn, enabling effects like smooth scrolling and avoiding the need for a contiguous frame buffer.
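
The tilemap idea from the list above can be sketched in a few lines. This is a simplified model, assuming hypothetical 2x2-pixel tiles and made-up tile data; real hardware of the era did the equivalent lookup in circuitry while the display beam was moving. Note that no full frame buffer is ever stored: each scan line is composed on demand from small tile indices.

```python
TILE_SIZE = 2  # hypothetical 2x2-pixel tiles to keep the example small

# Tile patterns: each tile is TILE_SIZE rows of TILE_SIZE pixel values.
TILES = {
    0: [[0, 0],
        [0, 0]],  # empty tile
    1: [[1, 1],
        [1, 0]],  # corner piece
}

# The tilemap stores one small index per cell instead of every pixel.
TILEMAP = [
    [1, 0, 1],
    [0, 1, 0],
]

def render_scanline(y: int) -> list:
    """Compose one scan line on the fly from the tilemap."""
    map_row = TILEMAP[y // TILE_SIZE]  # which row of tiles this line crosses
    row_in_tile = y % TILE_SIZE        # which pixel row inside each tile
    line = []
    for tile_index in map_row:
        line.extend(TILES[tile_index][row_in_tile])
    return line

scanlines = [render_scanline(y) for y in range(len(TILEMAP) * TILE_SIZE)]
```

Here a 6x4-pixel image is described by six tile indices plus two shared tile patterns, which is the memory saving the hardware was after.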

The Rise of Graphics Processors (1980s)

The 1980s saw the introduction of more integrated and capable graphics chips for personal computers.

μPD7220 (NEC, 1982): Considered the first implementation of a personal computer graphics display processor as a single large-scale integration (LSI) chip. It significantly lowered the cost and improved the performance of PC graphics cards, supporting resolutions up to 1024x1024 pixels and laying the groundwork for the PC graphics market. Intel later produced a licensed clone, the 82720.
Blitter (Amiga, 1985 & Williams Arcade Games, 1982): A "BLIT" (Block Image Transfer) processor is hardware designed to quickly move and manipulate blocks of bitmap data in memory. This was crucial for rapidly drawing windows, icons, and game graphics without burdening the main CPU. The Amiga also featured a dedicated coprocessor for manipulating graphics registers in sync with the video beam.
TMS34010 (Texas Instruments, 1986): This was the first fully programmable graphics processor. While it could run general-purpose code, it included specific instructions optimized for graphics operations. This chip became the basis for many early Windows accelerator cards under the Texas Instruments Graphics Architecture (TIGA).
IBM 8514 (1987): An early PC video card that implemented fixed-function 2D drawing primitives (like lines and rectangles) directly in hardware.
VGA (IBM, 1987): The Video Graphics Array standard became hugely influential, defining a common display mode (640x480, 16 colors or 320x200, 256 colors) that dominated PC graphics for years and served as a baseline for subsequent Super VGA (SVGA) standards developed by organizations like VESA (Video Electronics Standards Association).
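
The block transfer that a blitter performs, mentioned in the list above, can be illustrated with a short software version. This is only a sketch: the function name and argument order are assumptions for this example, bitmaps are modeled as lists of rows, and real blitters also supported logical operations and masking that are omitted here.

```python
def blit(src, dst, src_x, src_y, dst_x, dst_y, width, height):
    """Copy a width x height block of pixels from src into dst.

    Both bitmaps are lists of rows. A hardware blitter does this
    copy without the CPU touching each pixel.
    """
    for row in range(height):
        src_row = src[src_y + row]
        dst_row = dst[dst_y + row]
        dst_row[dst_x:dst_x + width] = src_row[src_x:src_x + width]

sprite = [[7, 7],
          [7, 7]]
screen = [[0] * 5 for _ in range(4)]

# Draw the 2x2 sprite with its top-left corner at column 2, row 1.
blit(sprite, screen, 0, 0, 2, 1, 2, 2)
```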

Acceleration and the Advent of 3D (1990s)

The 1990s brought significant advancements, particularly in accelerating Graphical User Interfaces (GUIs) and introducing real-time 3D graphics.

2D GUI Accelerators (Early 90s): Chips like the S3 86C911 (1991) focused on accelerating common operations needed by GUI operating systems like Windows, such as drawing lines, filling areas, and moving windows. These "Windows accelerators" greatly improved the responsiveness of the desktop environment, surpassing the performance of general-purpose graphics coprocessors for these specific tasks.
Fixed-Function 3D Hardware (Mid-Late 90s): As 3D games gained popularity, hardware dedicated to accelerating specific parts of the 3D rendering pipeline emerged. Early examples were found in arcade systems (Sega Model 1/2, Namco System 22) and consoles (PlayStation, Nintendo 64). PC add-in cards like the PowerVR and 3dfx Voodoo offered powerful 3D rasterization but often relied on the existing 2D card for display output.
Hardware Transform & Lighting (T&L): A crucial step in 3D graphics is preparing vertices (points in 3D space) before drawing them. This involves translating, rotating, and scaling them (Transform) and calculating how light affects their color (Lighting). Early systems did this on the CPU. Dedicated hardware for T&L dramatically increased the complexity of 3D scenes possible in real-time. Arcade systems like the Sega Model 2 (1993) had hardware T&L years before it became common in consumer PC cards. The Nintendo 64's Reality Coprocessor (1996) was an early console example.
The Term "GPU" is Coined: Sony first used the term "GPU" (Graphics Processing Unit) in 1994 for the chip in the original PlayStation console. Nvidia popularized the term in 1999 by marketing their GeForce 256 chip as the "world's first GPU," highlighting its integrated Transform, Lighting, and rendering capabilities on a single chip.
Integrated 2D/3D Chips: As manufacturing improved, companies started integrating 2D GUI acceleration and 3D capabilities onto a single chip (like the Rendition Verite chipsets or Nvidia RIVA 128), leading to the modern all-in-one graphics card.
Graphics APIs: The development of standardized Application Programming Interfaces (APIs) like OpenGL (from Silicon Graphics, early 90s) and Direct3D (from Microsoft, 1996, part of DirectX) was critical. These APIs provided a common language for software developers to communicate with different graphics hardware, abstracting away the low-level details and fostering competition among hardware vendors to implement the API features efficiently.

API (Application Programming Interface): A set of definitions, protocols, and tools for building software and applications. In graphics, APIs like DirectX and OpenGL provide standardized functions for drawing 2D and 3D graphics that software can call, regardless of the underlying hardware manufacturer.
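
The Transform and Lighting stage described earlier in this section comes down to a small amount of arithmetic repeated for every vertex. The sketch below is one simple instance, assuming a rotation about the Z axis followed by a translation, and a basic Lambertian (diffuse) lighting term. Fixed-function hardware supported richer models; the function names here are illustrative only.

```python
import math

def transform(vertex, angle_deg, translation):
    """Transform: rotate a vertex about the Z axis, then translate it."""
    x, y, z = vertex
    a = math.radians(angle_deg)
    xr = x * math.cos(a) - y * math.sin(a)
    yr = x * math.sin(a) + y * math.cos(a)
    tx, ty, tz = translation
    return (xr + tx, yr + ty, z + tz)

def diffuse_light(normal, light_dir):
    """Lighting: Lambertian term max(0, N . L), assuming unit-length vectors."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, dot)

moved = transform((1.0, 0.0, 0.0), 90, (0.0, 0.0, 5.0))
facing_light = diffuse_light((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
facing_away = diffuse_light((0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
```

A scene with tens of thousands of vertices repeats exactly this work for each one, every frame, which is why moving it from the CPU into dedicated hardware made such a difference.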

Programmable Shaders and GPGPU (2000s)

The early 2000s marked a pivotal shift from fixed-function graphics pipelines to programmable ones, leading to the modern GPU and the rise of general-purpose computing on GPUs.

Programmable Shaders: Nvidia's GeForce 3 (2001) was among the first consumer cards with programmable shading capabilities.
- Vertex Shaders: Small programs that run on each vertex, allowing custom manipulation of geometry (e.g., deforming a character's skin).
- Pixel/Fragment Shaders: Small programs that run on each pixel (or fragment, a potential pixel) being drawn, allowing complex calculations for color, lighting, and texture effects (like bump mapping, which simulates surface detail using texture data).
This shift gave developers much more flexibility and control over the final look of graphics.
Unified Shader Model: Initially, vertex and pixel shaders were handled by separate units. The Unified Shader Model (introduced in hardware around 2006-2007) allowed different types of shaders (vertex, geometry, pixel, compute) to run on the same pool of processing cores, improving flexibility and resource utilization.
GPGPU (General-Purpose Computing on GPU): Researchers and developers realized that the parallel processing power designed for graphics could be repurposed for non-graphics tasks. Since graphics involve massive arrays of data (pixels, vertices) and parallel operations, many scientific and computational problems exhibiting similar "embarrassingly parallel" structures could be mapped onto the GPU. Early GPGPU techniques often "abused" the graphics pipeline, treating data as textures and computations as pixel shader operations.
Dedicated Compute APIs (Late 2000s): To make GPGPU more accessible and efficient, vendors developed dedicated APIs and programming models.
- CUDA (Nvidia, 2007): A proprietary parallel computing platform and API model that allows software developers to use a CUDA-enabled GPU for general purpose processing. It provides direct access to the GPU's compute capabilities without needing to frame problems as graphics tasks.
- OpenCL (Khronos Group, 2009): An open standard for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. It provides a vendor-neutral approach to parallel computing.
These APIs enabled GPUs to become powerful accelerators for tasks like machine learning, scientific simulations, and financial modeling.
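
The vertex and pixel/fragment shaders described in this section can be pictured as two small functions that the hardware calls many times. The following Python sketch is only a mental model: real shaders are written in languages such as GLSL or HLSL, and the rasterization step that normally sits between the two stages is skipped here. All names and the checkerboard effect are assumptions chosen for illustration.

```python
def vertex_shader(position, scale):
    """Runs once per vertex; this one simply scales the geometry."""
    x, y, z = position
    return (x * scale, y * scale, z * scale)

def fragment_shader(u, v):
    """Runs once per fragment and returns an (r, g, b) color.

    This one draws a 4x4 checkerboard from coordinates in [0, 1).
    """
    checker = (int(u * 4) + int(v * 4)) % 2
    return (255, 255, 255) if checker else (0, 0, 0)

def draw(vertices, width, height, scale=2.0):
    # Stage 1: the vertex shader is applied to every vertex.
    shaded_vertices = [vertex_shader(p, scale) for p in vertices]
    # Stage 2: the fragment shader is applied to every pixel.
    image = [[fragment_shader(x / width, y / height) for x in range(width)]
             for y in range(height)]
    return shaded_vertices, image

verts, image = draw([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], width=4, height=4)
```

Swapping in a different `fragment_shader` changes the look of the whole image without touching the rest of the pipeline, which is the flexibility programmable shading introduced.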

GPGPU (General-Purpose Computing on GPU): The use of a GPU, which is typically used for computer graphics, to perform calculations that would traditionally be handled by the central processing unit (CPU). GPUs are well-suited for this when the computation involves parallelizable tasks.
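
A common first GPGPU exercise is SAXPY (`out = a * x + y`), where every output element can be computed independently. The sketch below emulates in plain Python how such a computation is typically organized: a "kernel" that handles one element, launched over a grid of blocks and threads. The parameter names echo CUDA's block and thread indexing, but `launch` and `saxpy_kernel` are hypothetical helpers for this example, and the nested loops run sequentially where a GPU would run the threads in parallel.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """One emulated thread: computes a single output element."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(x):                          # guard threads past the end of the data
        out[i] = a * x[i] + y[i]

def launch(kernel, n, block_dim, *args):
    """Emulated launch: loops stand in for threads that would run in parallel."""
    num_blocks = (n + block_dim - 1) // block_dim  # ceiling division
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 4, 2.0, x, y, out)
```

With five elements and four threads per block, two blocks are launched and three threads in the second block do nothing, which is why the bounds check inside the kernel matters.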

Computational Functions of a Modern GPU

A modern GPU is a highly complex processor with various units optimized for specific tasks within the graphics pipeline and general computation.

3D Graphics Pipeline Acceleration: This remains the core function. Hardware units accelerate:
- Geometry Processing: Handling vertices, including the hardware Transform and Lighting (T&L) stage.
- Rasterization: Converting 3D geometry into a 2D image made of pixels.
- Texturing: Applying images (textures) to the surfaces of 3D models.
- Shading: Running shaders (small programs) on vertices and pixels to determine their final properties (color, lighting, surface effects).
- Output/Frame Buffer Operations: Writing the final pixel data to the frame buffer.
2D Graphics Acceleration: While modern GPUs often emulate older 2D acceleration using their 3D hardware, they still efficiently handle fundamental 2D operations needed for rendering user interfaces and displaying simple images.
Video Decoding and Encoding Acceleration: Modern GPUs contain dedicated hardware blocks specifically designed to accelerate the process of playing back (decoding) and creating (encoding) digital video in various formats (H.264, HEVC, etc.). This offloads a significant burden from the CPU, enabling smooth high-definition video playback on systems that might otherwise struggle.
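
The rasterization step listed above can be sketched with the classic edge-function test: a pixel is covered if its center lies on the same side of all three triangle edges. This is a minimal software illustration under simplifying assumptions (2D vertices already in screen space, one sample per pixel, no depth or fill-rule handling); GPUs perform the equivalent test in dedicated hardware for many pixels at once.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed-area test: tells which side of edge A->B the point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    """Return the (x, y) pixels whose centers fall inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Inside if all three tests agree in sign (either winding order).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
               (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.append((x, y))
    return covered

pixels = rasterize([(0, 0), (4, 0), (0, 4)], 4, 4)
```

Each pixel's test is independent of every other pixel's, so this loop is exactly the kind of work that parallel hardware handles well.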

Video Decoding/Encoding Acceleration: Dedicated hardware on the GPU that performs computationally intensive parts of compressing or decompressing video streams, such as motion compensation or inverse discrete cosine transform (iDCT). This frees up the CPU for other tasks.
General-Purpose Computing (GPGPU): As discussed, the parallel architecture makes GPUs excellent for compute workloads. Modern GPUs include dedicated execution units (often called Streaming Multiprocessors or Compute Units) specifically designed for running parallel programs written using APIs like CUDA or OpenCL.

Streaming Multiprocessor (SM) / Compute Unit (CU) / Xe Core: Blocks of processing cores within a GPU chip that execute parallel computations. Nvidia uses the term SM, AMD uses CU, and Intel uses Xe Core for their respective architectures. More SMs/CUs/Xe Cores generally mean more parallel processing power.
Specialized Cores (Modern GPUs): High-end GPUs now include cores optimized for specific non-graphics tasks:
- Tensor Cores (Nvidia): Designed to accelerate matrix multiplication operations, which are fundamental to deep learning and AI training. They significantly boost performance for these specific workloads.
- Ray Tracing Cores (Nvidia RTX, AMD RDNA 2+): Hardware units that accelerate the calculation of light rays intersecting with objects in a 3D scene, enabling more realistic lighting, reflections, and shadows than traditional rasterization techniques.
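
The core operation behind ray tracing is an intersection test between a ray and a piece of geometry. The sketch below tests a ray against a sphere, chosen only because the math is short; ray tracing hardware typically works with bounding boxes and triangles instead. The function name is hypothetical, and the ray direction is assumed to be unit length.

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the distance along the ray to the nearest hit, or None on a miss.

    Solves |origin + t * direction - center|^2 = radius^2 for t.
    Simplification: a ray starting inside the sphere is reported as a miss.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - 4.0 * c  # discriminant of the quadratic (a = 1)
    if disc < 0:
        return None         # the ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t >= 0 else None

hit = ray_sphere_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0)
miss = ray_sphere_hit((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 5.0), 1.0)
```

A rendered frame may require many such tests per pixel, which is the motivation for building the intersection logic into dedicated units.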

Performance Metrics: GPU performance is often measured in FLOPS (Floating-point Operations Per Second), indicating how many floating-point calculations the GPU can perform per second. Modern high-end GPUs are rated in TFLOPS (TeraFLOPS, trillions of FLOPS), and some data-center parts quote PFLOPS (PetaFLOPS, quadrillions of FLOPS) for low-precision AI workloads. Other factors influencing performance include:

Clock Frequency: How fast the GPU cores and memory operate.
Memory Bandwidth: How quickly data can be transferred between the GPU processor and its dedicated memory (VRAM). This is a critical factor, as GPUs process massive amounts of data.
Number of Processing Cores: The count of SMs, CUs, or Xe Cores. More cores mean more parallel tasks can be executed simultaneously.
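
These metrics combine into simple back-of-the-envelope formulas. The sketch below computes a theoretical peak FLOPS figure and a theoretical memory bandwidth. All input numbers are hypothetical and do not describe any real product; "lanes" here means individual arithmetic units rather than whole SMs or CUs, and the factor of 2 assumes each lane completes one fused multiply-add (counted as two operations) per clock cycle.

```python
def peak_flops(lanes: int, clock_hz: int, flops_per_lane_per_cycle: int = 2) -> int:
    """Theoretical peak: lanes x clock x operations per lane per cycle."""
    return lanes * clock_hz * flops_per_lane_per_cycle

def memory_bandwidth_bytes_per_s(bus_width_bits: int, data_rate_bits_per_pin: int) -> int:
    """Theoretical bandwidth: bus width x per-pin data rate, converted to bytes."""
    return bus_width_bits * data_rate_bits_per_pin // 8

# Hypothetical GPU: 4096 lanes at 1.5 GHz, 256-bit bus at 16 Gbit/s per pin.
flops = peak_flops(4096, 1_500_000_000)
bandwidth = memory_bandwidth_bytes_per_s(256, 16_000_000_000)

tflops = flops / 1e12           # about 12.3 TFLOPS
gb_per_s = bandwidth / 1e9      # 512 GB/s
```

Real-world throughput is usually lower than these peaks, since the cores can only stay busy if memory delivers data fast enough.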

GPU Forms in Computer Systems

When building or understanding a computer, you'll encounter GPUs in different forms, each with implications for performance, cost, and power consumption.

Terminology Clarification

The term "GPU" itself has evolved slightly. While Sony used it in 1994 for their console chip, Nvidia's 1999 marketing of the GeForce 256 cemented it as referring to a "single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines" – essentially the modern graphics processor. ATI (later acquired by AMD) briefly used the term "Visual Processing Unit" (VPU), but "GPU" is now the dominant term.

There are two primary ways GPUs are integrated into a system:

Dedicated Graphics (Discrete GPU):

Dedicated Graphics (Discrete GPU): A graphics processing unit located on a separate, independent circuit board (often called a graphics card or video card) that plugs into a motherboard slot (like PCIe) and has its own dedicated high-speed memory (VRAM).
- Implementation: Dedicated GPUs are typically found on expansion cards (like PCIe cards) that are installed into slots on the motherboard. This modularity allows for easy upgrades or replacement.
- Memory: They have their own dedicated video RAM (VRAM), usually GDDR type memory, which is specifically designed for the high bandwidth needs of graphics processing. This memory is separate from the system's main RAM.
- Performance: Dedicated GPUs are generally significantly more powerful than integrated graphics because they have more processing cores, higher clock speeds, wider memory buses, and more VRAM. They are essential for demanding tasks like modern gaming, professional 3D rendering, video editing, and serious GPGPU workloads.
- Power & Cost: They consume more power and generate more heat than integrated solutions, requiring dedicated cooling. They also add considerable cost to the system.
- Multi-GPU Configurations: Technologies like Nvidia's SLI/NVLink or AMD's CrossFire allow multiple dedicated GPUs to work together on a single task, though support varies depending on the application (primarily used in high-end workstations, servers, and supercomputers for compute tasks rather than consumer gaming today).
Integrated Graphics (Integrated Graphics Processor - IGP):

Integrated Graphics (Integrated Graphics Processor - IGP): A graphics processing unit that is physically located on the same die as the Central Processing Unit (CPU) or as part of the motherboard's chipset. It shares the system's main RAM for graphics processing.
- Implementation: Integrated graphics are built into the CPU itself (e.g., Intel HD/UHD/Iris Graphics, AMD APU graphics) or less commonly, as part of the motherboard's chipset.
- Memory: They do not have dedicated VRAM. Instead, they "borrow" or share a portion of the system's main RAM (DDR memory).
- Performance: IGPs are generally less powerful than dedicated GPUs. Their performance is limited by sharing system RAM with the CPU (ordinary DDR memory typically offers far less bandwidth than the GDDR memory on a graphics card) and by typically having fewer processing cores and a smaller power budget.
- Use Case: They are suitable for basic tasks like displaying the operating system desktop, web browsing, video playback, and less graphically intensive applications or older games. They are common in laptops, entry-level desktops, and systems where cost, power efficiency, and size are prioritized over maximum graphics performance.
- Unified Memory Architecture (UMA): In modern systems where the IGP is on the same die as the CPU (like AMD APUs or Intel Core processors), the CPU and GPU share the same pool of physical RAM. This is known as UMA. This allows for flexible memory allocation between the CPU and GPU and enables efficient "zero-copy" data transfer between them, as they can directly access each other's data in memory without physically moving it. However, the shared memory pool and limited bandwidth to that pool remain performance constraints compared to dedicated VRAM.
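
The "zero-copy" idea from the UMA point above can be illustrated by analogy in Python. This is not GPU code: a `bytearray` stands in for shared physical RAM, a `bytes` copy stands in for data transferred into a discrete card's VRAM, and a `memoryview` stands in for the integrated GPU looking at the same memory the CPU uses.

```python
# Stands in for a buffer in the system's shared physical RAM.
system_ram = bytearray(8)

# Discrete-GPU model: the data is copied into separate memory (VRAM).
vram_copy = bytes(system_ram)

# UMA model: the GPU gets a view of the very same bytes, with no copy.
gpu_view = memoryview(system_ram)

# The "CPU" now updates the buffer.
system_ram[0] = 42

seen_by_uma_gpu = gpu_view[0]        # reflects the CPU's write immediately
seen_by_discrete_gpu = vram_copy[0]  # stale until the data is copied again
```

The copy model needs an explicit transfer every time the data changes, while the shared view does not, which is the efficiency UMA offers in exchange for contending with the CPU for the same memory bandwidth.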

Other GPU Forms

Hybrid Graphics: Less common now, this class of low-end graphics hardware sat between integrated and dedicated solutions. It paired a GPU with only a small amount of dedicated memory and borrowed the rest from system RAM, offering slightly better performance than integrated graphics alone while remaining cost-effective. ATI HyperMemory and Nvidia TurboCache are the best-known examples.
External GPU (eGPU):

External GPU (eGPU): A dedicated graphics processing unit housed in an external enclosure that connects to a computer (typically a laptop) via a high-speed external interface like Thunderbolt or OCuLink.
- Use Case: Allows laptops (which often only have integrated or lower-power dedicated graphics) to utilize a full-power desktop-class graphics card for demanding tasks when connected, effectively turning a portable machine into a more capable workstation or gaming system when at a desk.
- Connection: Requires a high-bandwidth external connection to the motherboard (like Thunderbolt 3/4 or OCuLink), as transferring data to and from the GPU externally needs significant speed.
- Requirement: The external enclosure requires its own power supply to power the high-wattage desktop GPU.

GPU Companies

The market for GPUs has seen many players over the years. Today, the personal computer and workstation market is dominated by three main companies:

Nvidia: A long-standing leader, particularly in high-performance gaming (GeForce series) and professional/AI computing (RTX, previously Quadro and Tesla series). Popularized the term "GPU" in its modern sense.
AMD: The primary competitor to Nvidia, offering Radeon series GPUs for gaming and consumer use, and Radeon Pro/Instinct series for professional and data center applications. Also significant in the console market (supplying chips for PlayStation and Xbox).
Intel: Historically dominant in integrated graphics (Intel Graphics Technology). Re-entered the dedicated GPU market more recently with their Arc series.

Other companies also produce GPUs, many of them for mobile devices (smartphones, tablets) or specialized markets:

Qualcomm: Adreno GPUs (commonly found in Android phones).
ARM: Mali GPUs (widely licensed for mobile System-on-Chips).
Imagination Technologies: PowerVR GPUs (historically used in Apple devices and others).
Matrox: Focuses on specialized multi-display and professional graphics solutions.
Jingjia Micro: A notable Chinese domestic GPU producer.

Conclusion

The GPU has evolved from simple graphics circuits assisting the CPU to become a powerful, highly parallel processor essential for modern computing. Whether integrated for basic display needs or dedicated for high-performance graphics and computation, it plays a vital role in delivering the visual experiences and accelerating complex workloads that define today's computer systems. Understanding its history, architecture, and different forms is key to appreciating its place alongside the CPU, memory, and other components in the intricate design of a computer.